In this report, we consider a non-optimal subsampling approach for fitting a PIM on a large dataset. We first simulate a dataset (sample size \(N = 250{,}000\)) under the normal linear model. We consider several models of increasing complexity. We then apply a non-optimal subsampling algorithm and record the estimated \(\beta\) parameters and the time needed (on the HPC infrastructure) to estimate them. In the first section, we briefly explain the algorithm. Next we give the simulation details. Finally, we look at the results.
The main problem when fitting a Probabilistic Index Model (PIM) on a large dataset is the time needed to estimate the parameters. In order to limit the computational time, we first explore a non-optimal subsampling approach. Consider a set of subsampling probabilities \(\pi_i, i = 1,...,n\) assigned to all data points. We define \(\boldsymbol{\pi} = \{\pi_i\}_{i=1}^n\). Next we apply the following algorithm:
In this report, we consider the uniform subsampling probabilities \(\boldsymbol{\pi}^{\text{UNI}} = \{\pi_i = n^{-1}\}_{i=1}^n\) in step 1.
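The loop can be sketched as follows. This is a minimal sketch, not the actual simulation script; `fitPIM` is a hypothetical wrapper around a PIM fitter that returns \(\hat\beta\) for one subsample.

```r
# Sketch of the non-optimal subsampling algorithm with uniform probabilities.
# fitPIM is a hypothetical function returning the estimated beta for one subsample.
subsamplePIM <- function(data, K, B, fitPIM) {
  n <- nrow(data)
  probs <- rep(1 / n, n)            # step 1: uniform subsampling probabilities
  betas <- numeric(B)
  for (b in seq_len(B)) {
    idx <- sample(n, size = K, replace = TRUE, prob = probs)  # draw a subsample of size K
    betas[b] <- fitPIM(data[idx, , drop = FALSE])             # fit the PIM on the subsample
  }
  mean(betas)                       # average the B per-iteration estimates
}
```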
We first generate data under the linear model:
\[ Y_i = \alpha X_i + \varepsilon_i \] with \(i = 1,...,n\) and \(\varepsilon_i\) i.i.d. \(N(0,\sigma^2)\) with \(\sigma^2 = 1\). We consider a sample size \(n = 250{,}000\). The predictor \(X\) is uniformly spaced on \([0.1,1]\). The value of \(\alpha\) is set to 5. Using the probit link function, the corresponding PIM can be expressed as:
\[
\Phi^{-1}[P(Y\preceq Y'|X,X')] = \beta(X' - X)
\] where \(\beta = \alpha/\sqrt{2\sigma^2}\). Hence the true value for \(\beta\) equals 3.5355339.
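As a sanity check, this probit identity can be verified by simulation at two fixed covariate values. The sketch below uses the model-1 parameters from the text; the pair \((x, x') = (0.2, 0.8)\) is chosen arbitrarily.

```r
# Monte Carlo check of Phi^{-1}[P(Y <= Y' | X, X')] = beta * (X' - X)
set.seed(42)
alpha <- 5; sigma <- 1
x <- 0.2; xprime <- 0.8
nrep <- 1e6
y      <- alpha * x      + rnorm(nrep, sd = sigma)
yprime <- alpha * xprime + rnorm(nrep, sd = sigma)
p_hat <- mean(y < yprime)                  # ties have probability zero here
beta  <- alpha / sqrt(2 * sigma^2)
c(lhs = qnorm(p_hat), rhs = beta * (xprime - x))  # both approximately 2.12
```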
For the second model, we choose \(\alpha = 10\), a predictor \(X\) uniformly spaced on \([0.1,10]\) and \(\sigma^2 = 25\). Hence, the true value for \(\beta =\) 1.4142136.
The third model is a multiple regression model with parameters taken from the analysis in Przybylski and Weinstein (2017). In their study, the authors used linear regression to model the effect of (among other variables) smartphone usage on weekdays and weekends on self-reported mental well-being. To control for confounding, they included variables for sex, whether the participant lived in an economically deprived area, and ethnic background (minority group yes/no). The full model is given as:
\[ Y_i = \mu + \alpha_1 X_i + \gamma_1 Z_{1i} + \gamma_2 Z_{2i} + \gamma_3 Z_{31i} + \varepsilon_i \]
Based on a linear regression, we obtain the regression parameters, the proportions of the covariates (\(Z_j\)), the range of the predictor (\(X\), smartphone usage during weekdays, measured on a Likert scale from 0 to 7) and the spread of the observations (\(\sigma = 9.51\)). These parameters are then used to generate i.i.d. normal data.
dataWB <- read.csv(file = paste(wd, '/OSF_Przybylski2017/data.csv', sep = ''))
# Smartphone screen usage during weekdays, corrected for sex, economic area and minority status
dataWB %>% select(mwbi, sp_wd, male, minority, deprived) %>%
filter(complete.cases(.)) %>%
mutate(sp_wd_sq = sp_wd**2) %>%
lm(mwbi ~ sp_wd + male + minority + deprived, data = .) %>%
summary(.)
Call:
lm(formula = mwbi ~ sp_wd + male + minority + deprived, data = .)
Residuals:
Min 1Q Median 3Q Max
-37.356 -5.675 0.461 6.146 26.756
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 46.71698 0.05967 782.86 < 2e-16 ***
sp_wd -0.43171 0.01183 -36.48 < 2e-16 ***
male 4.55003 0.05514 82.52 < 2e-16 ***
minority 0.30486 0.06514 4.68 2.87e-06 ***
deprived -0.45068 0.05578 -8.08 6.55e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.11 on 116625 degrees of freedom
Multiple R-squared: 0.08225, Adjusted R-squared: 0.08221
F-statistic: 2613 on 4 and 116625 DF, p-value: < 2.2e-16
# Spread of data
sd(dataWB$mwbi, na.rm = TRUE)
[1] 9.51486
# proportion of gender, minority and deprived
mean(dataWB$male)
[1] 0.475819
mean(dataWB$deprived)
[1] 0.4348
mean(dataWB$minority)
[1] 0.2407609
Then for each simulation, we generate a dataset through:
# Sample size
n <- 250000
# Regression parameters
alpha_0 <- 46.72
alpha_1 <- -0.43
# Control variables: sex (0 = female, 1 = male)
sex <- c(0,1)
alpha_sex <- 4.55
# Economic area (deprived = 1, otherwise 0)
econArea <- c(0,1)
alphaEcon <- -0.45
# Ethnic background (white = 0, otherwise 1)
ethnic <- c(0,1)
alpha_ethnic <- 0.30
# Predictor is a discrete scale from 0 --> 7
u <- 7
# Sigma based on dataset
sigma <- 9.51
# Seed: note, in simulations, this seed depends on ID of simulation!
set.seed(123)
# Predictors: proportions come from observed proportions in study of mental well being
X_smartph_hrs <- sample(x = c(0:u), size = n, replace = TRUE)
X_sex <- sample(x = sex, size = n, replace = TRUE, prob = c(1-0.4758, 0.4758))
X_econArea <- sample(x = econArea, size = n, replace = TRUE, prob = c(1-0.4348, 0.4348))
X_ethnic <- sample(x = ethnic, size = n, replace = TRUE, prob = c(1-0.2408, 0.2408))
# Observed data
Y <- alpha_0 + alpha_1*X_smartph_hrs + alpha_sex*X_sex +
alphaEcon*X_econArea + alpha_ethnic*X_ethnic +
rnorm(n = n, mean = 0, sd = sigma)
The true value for \(\beta =\) -0.0319722.
For comparison, we plot the distribution of mental well-being in the dataset (left) and of \(n = 250{,}000\) simulated data points (right).
par(mfrow = c(1,2))
hist(dataWB$mwbi, main = "Mental well being in Przybylski and Weinstein (2017)", xlab = "")
hist(Y, main = "Simulated mental well being", xlab = "")
Note that we did not reproduce the skewness observed in the real dataset.
For the fourth model, we choose \(\alpha = 1\), a predictor \(X\) uniformly spaced on \([0.1,10]\) and \(\sigma^2 = 25\). Hence, the true value for \(\beta =\) 0.1414214.
We vary the number of iterations (B) as well as the number of selected samples (K) in each iteration to find the optimal balance between bias and computation time. Both B and K range from 10 to 1000, with 10 equally spaced levels per parameter. Hence we have \(10 \times 10 = 100\) scenarios.
     B    K          B    K
1   10   10   ...  450 1000
2  120   10   ...  560 1000
3  230   10   ...  670 1000
4  340   10   ...  780 1000
5  450   10   ...  890 1000
6  560   10   ... 1000 1000
We simulate each combination of B and K 1000 times.
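The scenario grid, and the `pairs` object used when reading in the timing results below, can be constructed as follows. This is a sketch; the actual script may build these objects differently.

```r
# Ten equally spaced levels for both K and B between 10 and 1000
K_vec        <- round(seq(10, 1000, length.out = 10))   # 10, 120, 230, ..., 1000
nRSloops_vec <- round(seq(10, 1000, length.out = 10))
# All combinations; the first argument varies fastest, matching the printed grid
pairs <- expand.grid(nRSloops = nRSloops_vec, K = K_vec)
nrow(pairs)   # 100 scenarios
```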
Parameters:
u <- 1
alpha <- 5
sigma <- 1
trueBeta <- alpha/(sqrt(2) * sigma)
# Model (called SCEN in script)
SCEN <- 1
Start with estimated \(\beta\):
# load in estimated beta values
for(i in 1:nsim){
if(i %in% progress) print(paste0("At ", i/nsim*100, "%"))
# Values in one simulation: estimated beta values averaged within the sampling algorithm
ValSim <- try(read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"),
header = TRUE) %>% tbl_df() %>%
group_by(K, nRSloops, TrueBeta) %>% summarise(avBeta = mean(beta)) %>%
mutate(sim = i), silent = TRUE)
if(inherits(ValSim, "try-error")){
print(i); next
}
# Add to data frame with all simulations
valuesAllSim <- bind_rows(valuesAllSim, ValSim)
}
Same for computational time:
# load in computational time
for(i in 1:nsim){
if(i %in% progress) print(paste0("At ", i/nsim*100, "%"))
# Values in one simulation: computational time
TimeSim <- try(read.table(file = paste0(datLoc, 'uni_time_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("minutes"), header = TRUE) %>% tbl_df() %>%
bind_cols(., pairs) %>%
mutate(sim = i), silent = TRUE)
if(inherits(TimeSim, "try-error")){
print(i); next
}
# Add to data frame with all simulations
timeAllSim <- bind_rows(timeAllSim, TimeSim)
}
Quick summary:
tbl_df(valuesAllSim)
# A tibble: 100,000 × 4
K nRSloops avBeta sim
<int> <int> <dbl> <int>
1 10 10 4.100895 1
2 10 120 4.162509 1
3 10 230 4.109357 1
4 10 340 4.143610 1
5 10 450 4.200134 1
6 10 560 4.203660 1
7 10 670 4.236758 1
8 10 780 4.451395 1
9 10 890 4.413864 1
10 10 1000 4.392884 1
# ... with 99,990 more rows
group_by(valuesAllSim, K, nRSloops) %>% summarise(avBeta = mean(avBeta)) %>% summary()
K nRSloops avBeta
Min. : 10 Min. : 10 Min. :3.540
1st Qu.: 230 1st Qu.: 230 1st Qu.:3.541
Median : 505 Median : 505 Median :3.545
Mean : 505 Mean : 505 Mean :3.632
3rd Qu.: 780 3rd Qu.: 780 3rd Qu.:3.556
Max. :1000 Max. :1000 Max. :4.388
Plotting the estimated \(\beta\) parameters for each simulation, faceted by the number of iterations (B).
MSEtable_tmp <- group_by(valuesAllSim, K, nRSloops) %>% mutate(SE = (avBeta - TrueBeta)^2) %>%
summarise(MSE = mean(SE, na.rm = TRUE))
MSEtable <- matrix(MSEtable_tmp$MSE, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(MSEtable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(MSEtable) <- paste('K = ', K_vec, sep = '')
options( scipen = 6 )
knitr::kable(MSEtable)
| B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 | |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 1.2155397 | 0.7616045 | 0.7447556 | 0.7352143 | 0.7280799 | 0.7218844 | 0.7196143 | 0.7173493 | 0.7126988 | 0.7104787 |
| K = 120 | 0.0150656 | 0.0029115 | 0.0023595 | 0.0021759 | 0.0020534 | 0.0019846 | 0.0019494 | 0.0019144 | 0.0018901 | 0.0018665 |
| K = 230 | 0.0067774 | 0.0010084 | 0.0007558 | 0.0006425 | 0.0006106 | 0.0005776 | 0.0005625 | 0.0005529 | 0.0005440 | 0.0005369 |
| K = 340 | 0.0044479 | 0.0006562 | 0.0004525 | 0.0003832 | 0.0003465 | 0.0003166 | 0.0003113 | 0.0003044 | 0.0002980 | 0.0002972 |
| K = 450 | 0.0032917 | 0.0004495 | 0.0003090 | 0.0002659 | 0.0002391 | 0.0002240 | 0.0002182 | 0.0002125 | 0.0002078 | 0.0002052 |
| K = 560 | 0.0027479 | 0.0003494 | 0.0002533 | 0.0002176 | 0.0001903 | 0.0001745 | 0.0001679 | 0.0001623 | 0.0001581 | 0.0001561 |
| K = 670 | 0.0022307 | 0.0002729 | 0.0002076 | 0.0001715 | 0.0001559 | 0.0001479 | 0.0001418 | 0.0001379 | 0.0001353 | 0.0001318 |
| K = 780 | 0.0019095 | 0.0002638 | 0.0001825 | 0.0001569 | 0.0001397 | 0.0001354 | 0.0001276 | 0.0001242 | 0.0001205 | 0.0001200 |
| K = 890 | 0.0017163 | 0.0002233 | 0.0001612 | 0.0001383 | 0.0001234 | 0.0001150 | 0.0001103 | 0.0001081 | 0.0001063 | 0.0001025 |
| K = 1000 | 0.0016339 | 0.0002003 | 0.0001443 | 0.0001178 | 0.0001109 | 0.0001047 | 0.0000996 | 0.0000965 | 0.0000938 | 0.0000919 |
Plotting the computational time.
Note: this is run on the HPC, which runs faster than a regular PC.
Can we combine time and bias in one graph? Perhaps by creating a summary measure that includes a penalty term for the time required to estimate the parameters.
The figures below show the QQ-plots of \(\widehat\beta\) associated with model 1 when drawing 500 samples per iteration and applying 200 iterations.
for(j in 1:length(nRSloops_vec)){
K_temp <- K_vec[j]
B_temp <- nRSloops_vec[j]
assign(paste0("Plot", j),
valuesAllSim %>% filter(K == K_temp & nRSloops == B_temp) %>%
select(avBeta) %>% unlist(.) %>%
as.numeric() %>% gg_qq(x = ., title = paste0('K = ', K_temp, ' and B = ', B_temp))
)
}
cowplot::plot_grid(Plot1, Plot2, Plot3, Plot4, Plot5, Plot6, labels = c("A", "B", "C", "D", "E", "F"), ncol = 3)
cowplot::plot_grid(Plot7, Plot8, Plot9, Plot10, labels = c("G", "H", "I", "J"), ncol = 3)
In this section, we calculate the 95% confidence interval for every iteration (B). Within each iteration, we can check whether the interval covers the true value, and then average over all simulations to obtain the empirical coverage. Note that this simply amounts to running multiple simulations in which one checks the asymptotic properties of the sandwich estimator; it is not related to our research question!
# Empty data frame
valuesAvgCov <- data.frame() %>% tbl_df()
# load in estimated beta values and calculate coverage
for(i in 1:nsim){
# Values in one simulation: estimated beta values averaged within the sampling algorithm
ValSim <- try(read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"), header = TRUE) %>%
tbl_df(), silent = TRUE)
if(inherits(ValSim, "try-error")){
print(i); next
}
# Calculate coverage per simulation: estimate +/- (1.96 * sqrt(sandwichVariance)) for 95% CI
WithCoverage <- mutate(ValSim, CIlow = beta - (qnorm(p = ((1 - level)/2), lower.tail = FALSE) * sqrt(sVariance))) %>%
mutate(CIup = beta + (qnorm(p = ((1 - level)/2), lower.tail = FALSE) * sqrt(sVariance))) %>% rowwise() %>%
mutate(coverage = ifelse(TrueBeta >= CIlow && TrueBeta <= CIup, 1, 0))
# Average coverage per simulation
AvgCov <- WithCoverage %>% ungroup() %>% group_by(K, nRSloops) %>%
summarise(AvgCov = mean(coverage, na.rm = TRUE)) %>%
mutate(sim = i)
# Add to data frame with all simulations
valuesAvgCov <- bind_rows(valuesAvgCov, AvgCov)
}
Now we proceed to plot the average CI level per B and K.
For more detail, we refer to the table below with the average empirical coverage per B (columns) and K (rows).
| B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 | |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.811 | 0.778 | 0.760 | 0.760 | 0.767 | 0.768 | 0.769 | 0.761 | 0.763 | 0.763 |
| K = 120 | 0.940 | 0.938 | 0.936 | 0.936 | 0.936 | 0.936 | 0.935 | 0.935 | 0.934 | 0.934 |
| K = 230 | 0.960 | 0.942 | 0.942 | 0.940 | 0.941 | 0.941 | 0.941 | 0.941 | 0.941 | 0.940 |
| K = 340 | 0.936 | 0.945 | 0.941 | 0.944 | 0.944 | 0.944 | 0.943 | 0.944 | 0.944 | 0.943 |
| K = 450 | 0.940 | 0.948 | 0.948 | 0.949 | 0.945 | 0.945 | 0.944 | 0.945 | 0.945 | 0.946 |
| K = 560 | 0.952 | 0.949 | 0.944 | 0.946 | 0.947 | 0.948 | 0.948 | 0.947 | 0.946 | 0.946 |
| K = 670 | 0.940 | 0.940 | 0.946 | 0.946 | 0.946 | 0.946 | 0.946 | 0.946 | 0.946 | 0.947 |
| K = 780 | 0.944 | 0.947 | 0.948 | 0.950 | 0.949 | 0.950 | 0.950 | 0.949 | 0.949 | 0.949 |
| K = 890 | 0.960 | 0.947 | 0.950 | 0.948 | 0.949 | 0.949 | 0.949 | 0.948 | 0.950 | 0.949 |
| K = 1000 | 0.956 | 0.953 | 0.952 | 0.953 | 0.951 | 0.950 | 0.951 | 0.951 | 0.951 | 0.950 |
Now we turn to the more interesting question: how can we create confidence intervals around the average \(\beta\) PIM estimate obtained in one iteration of the non-optimal resampling procedure? We suggest using a bootstrap procedure. Let us try the bias-corrected and accelerated bootstrap (BCa), which adjusts for both bias and skewness. Note that this might introduce extra computation time. To get the results here, we run all simulations in parallel. Note that we defined the function to calculate the coverages (BootParCI) earlier.
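BootParCI itself is not reproduced in this report; the core BCa computation it relies on can be sketched with the boot package. In this minimal sketch, `betas` is a placeholder vector standing in for the B per-iteration estimates of one simulation.

```r
library(boot)
set.seed(123)
betas <- rnorm(200, mean = 3.54, sd = 0.2)   # placeholder per-iteration estimates
meanStat <- function(x, id) mean(x[id])      # statistic: mean over a bootstrap resample
bt <- boot(betas, statistic = meanStat, R = 2000)
ci <- boot.ci(bt, type = "bca")
ci$bca[4:5]                                  # lower and upper BCa bounds
```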
# Detect and start the workers
P <- detectCores(logical = FALSE) # physical cores
cl <- makeCluster(P)
# Initialize them with the libraries
clusterEvalQ(cl, library(tidyverse))
clusterEvalQ(cl, library(boot))
# Run in parallel and append results
boot_par_results <- clusterApply(cl, 1:nsim, fun = BootParCI, datLoc = datLoc, SCEN = SCEN)
boot_CI_coverage <- do.call(rbind, boot_par_results)
# Stop the workers
stopCluster(cl)
| B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 | |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.639 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| K = 120 | 0.877 | 0.724 | 0.559 | 0.434 | 0.339 | 0.261 | 0.194 | 0.146 | 0.125 | 0.088 |
| K = 230 | 0.896 | 0.844 | 0.746 | 0.675 | 0.583 | 0.512 | 0.458 | 0.395 | 0.361 | 0.322 |
| K = 340 | 0.897 | 0.835 | 0.783 | 0.714 | 0.667 | 0.616 | 0.556 | 0.524 | 0.475 | 0.424 |
| K = 450 | 0.908 | 0.871 | 0.818 | 0.744 | 0.692 | 0.658 | 0.613 | 0.578 | 0.553 | 0.518 |
| K = 560 | 0.886 | 0.875 | 0.814 | 0.754 | 0.718 | 0.674 | 0.659 | 0.627 | 0.597 | 0.572 |
| K = 670 | 0.895 | 0.890 | 0.802 | 0.766 | 0.721 | 0.671 | 0.643 | 0.619 | 0.595 | 0.564 |
| K = 780 | 0.897 | 0.863 | 0.812 | 0.761 | 0.722 | 0.662 | 0.648 | 0.609 | 0.586 | 0.567 |
| K = 890 | 0.886 | 0.861 | 0.808 | 0.746 | 0.709 | 0.663 | 0.640 | 0.611 | 0.584 | 0.568 |
| K = 1000 | 0.869 | 0.864 | 0.814 | 0.763 | 0.715 | 0.676 | 0.651 | 0.622 | 0.588 | 0.558 |
Quite bad coverages!
What happens if we consider the PIM \(\beta\) estimates as K i.i.d. data points and construct a \(t\)-based CI? Using the standard deviation of the estimated \(\beta\) parameters, we get:
\[ \left[\bar{\beta} \pm t_{\alpha/2,\, K-1}\, \text{sd}(\beta)/\sqrt{K - 1}\right] \] Note that this is not really correct, but we just want to see what effect it has on the empirical coverage.
norm_tAvg <- c()
# Read in data (again, sigh)
for(i in 1:nsim){
ValSim <- read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"), header = TRUE) %>%
tbl_df()
# CI using SD of data and normal quantiles and using t-quantiles
# Formula = sd/sqrt(n) for some strange reason (I expected sd of beta being equal to se)
norm_tmp <- ValSim %>% group_by(K, nRSloops) %>%
mutate(
CIlow = beta - (qnorm(0.025, lower.tail = FALSE) * (sd(beta)/(sqrt(K - 1)))),
CIup = beta + (qnorm(0.025, lower.tail = FALSE) * (sd(beta)/(sqrt(K - 1)))),
type = 'norm') %>% group_by(K, nRSloops, TrueBeta, type) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
AvgLow = mean(CIlow, na.rm = TRUE),
AvgUp = mean(CIup, na.rm = TRUE)) %>% ungroup() %>% rowwise() %>%
mutate(coverage_ind = ifelse(TrueBeta >= AvgLow && TrueBeta <= AvgUp, 1, 0),
sim = i)
t_tmp <- ValSim %>% group_by(K, nRSloops) %>%
mutate(
CIlow = beta - (qt(0.025, df = (K - 1), lower.tail = FALSE) * (sd(beta)/(sqrt(K - 1)))),
CIup = beta + (qt(0.025, df = (K - 1), lower.tail = FALSE) * (sd(beta)/(sqrt(K - 1)))),
type = 't') %>% group_by(K, nRSloops, TrueBeta, type) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
AvgLow = mean(CIlow, na.rm = TRUE),
AvgUp = mean(CIup, na.rm = TRUE)) %>% ungroup() %>% rowwise() %>%
mutate(coverage_ind = ifelse(TrueBeta >= AvgLow && TrueBeta <= AvgUp, 1, 0),
sim = i)
# Construct general CI by averaging endpoints
norm_tAvg <- bind_rows(norm_tAvg,norm_tmp, t_tmp)
}
# Plot
norm_tAvg %>% filter(type == 't' & K == 1000 & nRSloops == 1000) %>%
slice(round(seq(1,nsim,length.out = 50),0)) %>%
ggplot(., aes(x = TrueBeta)) + geom_vline(aes(xintercept = TrueBeta)) +
geom_point(aes(x = AvgBeta, y = sim)) +
geom_segment(aes(x = AvgLow, xend = AvgUp, y = sim, yend = sim)) +
scale_x_continuous("beta") + scale_y_continuous("simulation") +
ggtitle("95% CI around beta PIM estimate")
resTable_tmp <- norm_tAvg %>% filter(type == 't') %>% group_by(K,nRSloops) %>% summarise(EC = mean(coverage_ind))
resTable <- matrix(resTable_tmp$EC, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(resTable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(resTable) <- paste('K = ', K_vec, sep = '')
knitr::kable(resTable)
| B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 | |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.908 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| K = 120 | 0.381 | 0.742 | 0.829 | 0.863 | 0.897 | 0.915 | 0.934 | 0.937 | 0.946 | 0.956 |
| K = 230 | 0.309 | 0.686 | 0.748 | 0.804 | 0.828 | 0.850 | 0.869 | 0.870 | 0.872 | 0.873 |
| K = 340 | 0.269 | 0.603 | 0.681 | 0.720 | 0.750 | 0.781 | 0.788 | 0.790 | 0.792 | 0.809 |
| K = 450 | 0.240 | 0.567 | 0.651 | 0.674 | 0.707 | 0.729 | 0.735 | 0.738 | 0.731 | 0.739 |
| K = 560 | 0.196 | 0.509 | 0.591 | 0.618 | 0.674 | 0.674 | 0.708 | 0.714 | 0.709 | 0.724 |
| K = 670 | 0.190 | 0.489 | 0.562 | 0.603 | 0.625 | 0.629 | 0.648 | 0.659 | 0.658 | 0.676 |
| K = 780 | 0.159 | 0.452 | 0.532 | 0.576 | 0.582 | 0.596 | 0.606 | 0.612 | 0.611 | 0.621 |
| K = 890 | 0.158 | 0.422 | 0.485 | 0.507 | 0.543 | 0.548 | 0.566 | 0.583 | 0.579 | 0.593 |
| K = 1000 | 0.138 | 0.395 | 0.467 | 0.530 | 0.527 | 0.534 | 0.548 | 0.562 | 0.569 | 0.564 |
Essentially, the non-optimal subsampling algorithm can also be considered as m-out-of-n bootstrapping, with \(m \ll n\). In our case, m corresponds to K, the subsample size.
However, in the bootstrap algorithm, it is essential to scale the CI in order to match the variability of the full dataset. The standard error of the estimator (\(se(\hat\theta_n)\)) is approximated through:
\[ se(\hat\theta_n) \approx se(\hat\theta^*_K) \times \sqrt{\frac{K}{n}} \] in which \(se(\hat\theta^*_K)\) is the standard deviation of the subsample estimates over the B iterations.
We shall try this here.
scaledSD <- data.frame() %>% tbl_df()
# Read in data
for(i in 1:nsim){
ValSim <- read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"), header = TRUE) %>%
tbl_df()
# CI using SD of beta with scaling
sdScale_tmp <- ValSim %>% group_by(K, nRSloops, TrueBeta) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
sdBeta = sd(beta, na.rm = TRUE)) %>%
mutate(CIlow = AvgBeta - (qnorm(0.025, lower.tail = FALSE) * sdBeta * sqrt(K/n)),
CIup = AvgBeta + (qnorm(0.025, lower.tail = FALSE) * sdBeta * sqrt(K/n)),
type = 'scaledSD') %>%
ungroup() %>% rowwise() %>%
mutate(coverage_ind = ifelse(TrueBeta >= CIlow && TrueBeta <= CIup, 1, 0),
sim = i)
# Collect the values over all simulations
scaledSD <- bind_rows(scaledSD,sdScale_tmp)
}
scaledSDTable_tmp <- scaledSD %>% group_by(K,nRSloops) %>% summarise(EC = mean(coverage_ind))
scaledSDTable <- matrix(scaledSDTable_tmp$EC, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(scaledSDTable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(scaledSDTable) <- paste('K = ', K_vec, sep = '')
knitr::kable(scaledSDTable)
| B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 | |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.012 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| K = 120 | 0.092 | 0.172 | 0.145 | 0.113 | 0.091 | 0.065 | 0.054 | 0.041 | 0.035 | 0.029 |
| K = 230 | 0.128 | 0.334 | 0.363 | 0.353 | 0.348 | 0.340 | 0.340 | 0.334 | 0.313 | 0.307 |
| K = 340 | 0.185 | 0.410 | 0.474 | 0.511 | 0.527 | 0.544 | 0.537 | 0.544 | 0.538 | 0.532 |
| K = 450 | 0.220 | 0.512 | 0.588 | 0.624 | 0.640 | 0.660 | 0.676 | 0.676 | 0.675 | 0.676 |
| K = 560 | 0.228 | 0.551 | 0.634 | 0.686 | 0.725 | 0.731 | 0.750 | 0.757 | 0.765 | 0.774 |
| K = 670 | 0.237 | 0.624 | 0.690 | 0.738 | 0.764 | 0.777 | 0.786 | 0.797 | 0.809 | 0.812 |
| K = 780 | 0.250 | 0.644 | 0.738 | 0.769 | 0.786 | 0.797 | 0.810 | 0.815 | 0.824 | 0.826 |
| K = 890 | 0.269 | 0.660 | 0.778 | 0.796 | 0.816 | 0.832 | 0.847 | 0.852 | 0.862 | 0.867 |
| K = 1000 | 0.268 | 0.694 | 0.796 | 0.835 | 0.859 | 0.865 | 0.874 | 0.878 | 0.886 | 0.893 |
scaledSVAR <- data.frame() %>% tbl_df()
# Read in data
for(i in 1:nsim){
ValSim <- read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"), header = TRUE) %>%
tbl_df()
# CI using mean sandwich variance of beta with scaling
svarScale_tmp <- ValSim %>% group_by(K, nRSloops, TrueBeta) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
AvgSvarBeta = mean(sVariance, na.rm = TRUE)) %>%
mutate(CIlow = AvgBeta - (qnorm(0.025, lower.tail = FALSE) *
sqrt(AvgSvarBeta) * sqrt(K/n)),
CIup = AvgBeta + (qnorm(0.025, lower.tail = FALSE) *
sqrt(AvgSvarBeta) * sqrt(K/n)),
type = 'scaledSVAR') %>%
ungroup() %>% rowwise() %>%
mutate(coverage_ind = ifelse(TrueBeta >= CIlow && TrueBeta <= CIup, 1, 0),
sim = i)
# Collect the values over all simulations
scaledSVAR <- bind_rows(scaledSVAR,svarScale_tmp)
}
scaledSVARtable_tmp <- scaledSVAR %>% group_by(K,nRSloops) %>% summarise(EC = mean(coverage_ind))
scaledSVARtable <- matrix(scaledSVARtable_tmp$EC, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(scaledSVARtable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(scaledSVARtable) <- paste('K = ', K_vec, sep = '')
knitr::kable(scaledSVARtable)
| B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 | |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.006 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| K = 120 | 0.092 | 0.164 | 0.142 | 0.106 | 0.088 | 0.061 | 0.047 | 0.040 | 0.034 | 0.026 |
| K = 230 | 0.131 | 0.334 | 0.352 | 0.345 | 0.338 | 0.332 | 0.326 | 0.324 | 0.304 | 0.303 |
| K = 340 | 0.191 | 0.407 | 0.465 | 0.502 | 0.516 | 0.533 | 0.526 | 0.532 | 0.526 | 0.523 |
| K = 450 | 0.218 | 0.519 | 0.584 | 0.623 | 0.628 | 0.656 | 0.668 | 0.674 | 0.668 | 0.670 |
| K = 560 | 0.230 | 0.550 | 0.630 | 0.679 | 0.727 | 0.721 | 0.753 | 0.750 | 0.761 | 0.773 |
| K = 670 | 0.254 | 0.617 | 0.689 | 0.734 | 0.760 | 0.771 | 0.782 | 0.792 | 0.809 | 0.813 |
| K = 780 | 0.270 | 0.639 | 0.738 | 0.766 | 0.784 | 0.801 | 0.810 | 0.809 | 0.824 | 0.828 |
| K = 890 | 0.281 | 0.666 | 0.773 | 0.801 | 0.820 | 0.832 | 0.844 | 0.855 | 0.860 | 0.866 |
| K = 1000 | 0.273 | 0.695 | 0.800 | 0.826 | 0.858 | 0.867 | 0.871 | 0.873 | 0.886 | 0.887 |
After talking with Jan, it should really be:
\[
\left[ \widehat{\beta_{final}} \pm z_{1-\alpha/2} \times \sqrt{\frac{1}{B^2} \sum_{i=1}^B \widehat{\text{Var}(\beta_i)}} \right]
\] since we have B parts of K data points each. To obtain the final estimate \(\widehat{\beta}_{final}\), we take the mean of the B estimates; hence we sum and divide by \(B\). The variance of a constant times a sum equals the squared constant times the sum of the variances, provided the B parts are independent!
However, in our subsampling scheme this is not the case! So I expect somewhat worse coverages here, but better coverage for the m-out-of-n bootstrap procedure.
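The independence argument above can be checked numerically. This is a small sketch with arbitrary heterogeneous variances, unrelated to the simulation data.

```r
# Under independence, Var(mean of B estimates) = sum of the B variances / B^2
set.seed(1)
B   <- 400
sds <- runif(B, 0.5, 1.5)                       # arbitrary per-part standard deviations
sim <- replicate(20000, mean(rnorm(B, sd = sds)))
c(empirical = var(sim), formula = sum(sds^2) / B^2)
```

The two numbers agree up to Monte Carlo error, confirming the \(1/B^2\) scaling of the summed variances.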
SummedSvar <- data.frame() %>% tbl_df()
# Read in data
for(i in 1:nsim){
ValSim <- read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"), header = TRUE) %>%
tbl_df()
# CI through now summing sandwich variance of beta
SummedSvar_tmp <- ValSim %>% group_by(K, nRSloops, TrueBeta) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
SummedSvar = sum(sVariance, na.rm = TRUE)) %>%
mutate(bVar = SummedSvar * (1/nRSloops^2),
CIlow = AvgBeta - (qnorm(0.025, lower.tail = FALSE) *
sqrt(bVar)),
CIup = AvgBeta + (qnorm(0.025, lower.tail = FALSE) *
sqrt(bVar)),
type = 'avSVar', sim = i,
coverage_ind =
ifelse(TrueBeta >= CIlow && TrueBeta <= CIup, 1, 0))
# Collect the values over all simulations
SummedSvar <- bind_rows(SummedSvar,SummedSvar_tmp)
}
SummedSvartable_tmp <- SummedSvar %>% group_by(K,nRSloops) %>% summarise(EC = mean(coverage_ind))
SummedSvartable <- matrix(SummedSvartable_tmp$EC, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(SummedSvartable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(SummedSvartable) <- paste('K = ', K_vec, sep = '')
knitr::kable(SummedSvartable)
| B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 | |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.458 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| K = 120 | 0.923 | 0.692 | 0.531 | 0.403 | 0.304 | 0.233 | 0.178 | 0.137 | 0.107 | 0.080 |
| K = 230 | 0.934 | 0.833 | 0.731 | 0.661 | 0.568 | 0.493 | 0.437 | 0.389 | 0.344 | 0.315 |
| K = 340 | 0.945 | 0.852 | 0.774 | 0.703 | 0.653 | 0.615 | 0.546 | 0.513 | 0.470 | 0.422 |
| K = 450 | 0.951 | 0.876 | 0.807 | 0.745 | 0.695 | 0.654 | 0.621 | 0.567 | 0.539 | 0.513 |
| K = 560 | 0.940 | 0.882 | 0.815 | 0.751 | 0.724 | 0.671 | 0.662 | 0.614 | 0.590 | 0.564 |
| K = 670 | 0.942 | 0.888 | 0.810 | 0.761 | 0.716 | 0.673 | 0.637 | 0.621 | 0.591 | 0.560 |
| K = 780 | 0.939 | 0.865 | 0.804 | 0.752 | 0.720 | 0.668 | 0.638 | 0.610 | 0.584 | 0.575 |
| K = 890 | 0.950 | 0.864 | 0.809 | 0.743 | 0.702 | 0.659 | 0.648 | 0.617 | 0.580 | 0.568 |
| K = 1000 | 0.935 | 0.880 | 0.815 | 0.764 | 0.714 | 0.678 | 0.642 | 0.614 | 0.581 | 0.558 |
Parameters:
u <- 10
alpha <- 10
sigma <- 5
trueBeta <- alpha/(sqrt(2) * sigma)
# Model (called SCEN in script)
SCEN <- 2
Start with estimated \(\beta\):
# load in estimated beta values
for(i in 1:nsim){
if(i %in% progress) print(paste0("At ", i/nsim*100, "%"))
# Values in one simulation: estimated beta values averaged within the sampling algorithm
ValSim <- try(read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"),
header = TRUE) %>% tbl_df() %>%
group_by(K, nRSloops) %>% summarise(avBeta = mean(beta)) %>%
mutate(sim = i), silent = TRUE)
if(inherits(ValSim, "try-error")){
print(i); next
}
# Add to data frame with all simulations
valuesAllSim <- bind_rows(valuesAllSim, ValSim)
}
Same for computational time:
# load in computational time
for(i in 1:nsim){
if(i %in% progress) print(paste0("At ", i/nsim*100, "%"))
# Values in one simulation: computational time
TimeSim <- try(read.table(file = paste0(datLoc, 'uni_time_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("minutes"), header = TRUE) %>% tbl_df() %>%
bind_cols(., pairs) %>%
mutate(sim = i), silent = TRUE)
if(inherits(TimeSim, "try-error")){
print(i); next
}
# Add to data frame with all simulations
timeAllSim <- bind_rows(timeAllSim, TimeSim)
}
Quick summary:
tbl_df(valuesAllSim)
# A tibble: 49,900 × 4
K nRSloops avBeta sim
<int> <int> <dbl> <int>
1 10 10 1.037945 1
2 10 120 1.173419 1
3 10 230 1.113818 1
4 10 340 1.119661 1
5 10 450 1.109486 1
6 10 560 1.109550 1
7 10 670 1.107620 1
8 10 780 1.086124 1
9 10 890 1.089851 1
10 10 1000 1.090058 1
# ... with 49,890 more rows
group_by(valuesAllSim, K, nRSloops) %>% summarise(avBeta = mean(avBeta)) %>% summary()
K nRSloops avBeta
Min. : 10 Min. : 10 Min. :0.8400
1st Qu.: 230 1st Qu.: 230 1st Qu.:0.8407
Median : 505 Median : 505 Median :0.8424
Mean : 505 Mean : 505 Mean :0.8693
3rd Qu.: 780 3rd Qu.: 780 3rd Qu.:0.8478
Max. :1000 Max. :1000 Max. :1.1006
Plotting the estimated \(\beta\) parameters for each simulation, faceted by the number of iterations (B).
Plotting the computational time.
These data-generating parameters do not work!
Parameters:
alpha_1 <- -0.43
# Sigma based on dataset
sigma <- 9.51
trueBeta <- alpha_1/(sqrt(2) * sigma)
# Model (called SCEN in script)
SCEN <- 3
Start with estimated \(\beta\):
S3_colnames <- c("beta.X_smartph_hrs", "beta.X_sex",
"beta.X_econArea", "beta.X_ethnic",
"sVariance.X_smartph_hrs", "sVariance.X_sex",
"sVariance.X_econArea",
"sVariance.X_ethnic", "K", "nRSloops", "TrueBeta")
# load in estimated beta values
for(i in 1:nsim){
if(i %in% progress) print(paste0("At ", i/nsim*100, "%"))
# Values in one simulation: estimated beta values averaged within the sampling algorithm
ValSim <- try(read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_3.txt'),
col.names = S3_colnames, header = TRUE) %>% tbl_df() %>%
group_by(K, nRSloops) %>% summarise_all(mean) %>%
mutate(sim = i), silent = TRUE)
if(inherits(ValSim, "try-error")){
print(i); next
}
# Add to data frame with all simulations
valuesAllSim <- bind_rows(valuesAllSim, ValSim)
}
Same for computational time:
# load in computational time
for(i in 1:nsim){
if(i %in% progress) print(paste0("At ", i/nsim*100, "%"))
# Values in one simulation: computational time
TimeSim <- read.table(file = paste0(datLoc, 'uni_time_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("minutes"), header = TRUE) %>% tbl_df() %>%
bind_cols(., pairs) %>%
mutate(sim = i)
# Add to data frame with all simulations
timeAllSim <- bind_rows(timeAllSim, TimeSim)
}
Quick summary:
group_by(valuesAllSim, K, nRSloops) %>% summarise(avBeta = mean(beta.X_smartph_hrs)) %>% summary()
K nRSloops avBeta
Min. : 10 Min. : 10 Min. :-0.05688
1st Qu.: 230 1st Qu.: 230 1st Qu.:-0.03234
Median : 505 Median : 505 Median :-0.03220
Mean : 505 Mean : 505 Mean :-0.03465
3rd Qu.: 780 3rd Qu.: 780 3rd Qu.:-0.03212
Max. :1000 Max. :1000 Max. :-0.03193
Plotting the estimated \(\beta\) parameters for each simulation, faceted by the number of iterations (B).
MSEtable_tmp <- group_by(valuesAllSim, K, nRSloops) %>% mutate(SE = (beta.X_smartph_hrs - TrueBeta)^2) %>%
summarise(MSE = mean(SE, na.rm = TRUE))
MSEtable <- matrix(MSEtable_tmp$MSE, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(MSEtable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(MSEtable) <- paste('K = ', K_vec, sep = '')
options(scipen = 6)
knitr::kable(MSEtable)
|  | B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.0093666 | 0.0013358 | 0.0010234 | 0.0008958 | 0.0008330 | 0.0007854 | 0.0007513 | 0.0007282 | 0.0007313 | 0.0007220 |
| K = 120 | 0.0000887 | 0.0000088 | 0.0000051 | 0.0000037 | 0.0000030 | 0.0000027 | 0.0000024 | 0.0000022 | 0.0000021 | 0.0000020 |
| K = 230 | 0.0000474 | 0.0000045 | 0.0000025 | 0.0000019 | 0.0000016 | 0.0000013 | 0.0000012 | 0.0000011 | 0.0000010 | 0.0000010 |
| K = 340 | 0.0000312 | 0.0000027 | 0.0000018 | 0.0000014 | 0.0000012 | 0.0000010 | 0.0000010 | 0.0000009 | 0.0000009 | 0.0000009 |
| K = 450 | 0.0000243 | 0.0000024 | 0.0000015 | 0.0000012 | 0.0000010 | 0.0000009 | 0.0000009 | 0.0000008 | 0.0000008 | 0.0000007 |
| K = 560 | 0.0000182 | 0.0000020 | 0.0000012 | 0.0000009 | 0.0000009 | 0.0000008 | 0.0000008 | 0.0000007 | 0.0000007 | 0.0000006 |
| K = 670 | 0.0000154 | 0.0000018 | 0.0000012 | 0.0000010 | 0.0000009 | 0.0000008 | 0.0000007 | 0.0000007 | 0.0000006 | 0.0000006 |
| K = 780 | 0.0000136 | 0.0000015 | 0.0000010 | 0.0000008 | 0.0000007 | 0.0000006 | 0.0000006 | 0.0000006 | 0.0000006 | 0.0000006 |
| K = 890 | 0.0000123 | 0.0000012 | 0.0000009 | 0.0000007 | 0.0000007 | 0.0000006 | 0.0000006 | 0.0000006 | 0.0000006 | 0.0000006 |
| K = 1000 | 0.0000105 | 0.0000012 | 0.0000008 | 0.0000007 | 0.0000006 | 0.0000006 | 0.0000006 | 0.0000005 | 0.0000005 | 0.0000005 |
Plotting the computational time. Next, normal QQ plots of the estimated \(\beta\) parameters for selected combinations of K and B:
for(j in 1:length(nRSloops_vec)){
K_temp <- K_vec[j]
B_temp <- nRSloops_vec[j]
assign(paste0("Plot", j),
valuesAllSim %>% filter(K == K_temp & nRSloops == B_temp) %>%
select(beta.X_smartph_hrs) %>% unlist(.) %>%
as.numeric() %>% gg_qq(x = ., title = paste0('K = ', K_temp, ' and B = ', B_temp))
)
}
cowplot::plot_grid(Plot1, Plot2, Plot3, Plot4, Plot5, Plot6, labels = c("A", "B", "C", "D", "E", "F"), ncol = 3)
cowplot::plot_grid(Plot7, Plot8, Plot9, Plot10, labels = c("G", "H", "I", "J"), ncol = 3)
Same procedure as model 1:
# Detect and start the workers
P <- detectCores(logical = FALSE) # physical cores
cl <- makeCluster(P)
# Initialize them with the libraries
clusterEvalQ(cl, library(tidyverse))
clusterEvalQ(cl, library(boot))
# Run in parallel and append results
boot_par_results <- clusterApply(cl, 1:nsim, fun = BootParCI, datLoc = datLoc, SCEN = SCEN)
boot_CI_coverage <- do.call(rbind, boot_par_results)
# Stop the workers
stopCluster(cl)
|  | B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.856 | 0.841 | 0.737 | 0.647 | 0.569 | 0.509 | 0.451 | 0.393 | 0.320 | 0.285 |
| K = 120 | 0.905 | 0.928 | 0.916 | 0.898 | 0.886 | 0.872 | 0.850 | 0.834 | 0.822 | 0.806 |
| K = 230 | 0.899 | 0.936 | 0.921 | 0.900 | 0.897 | 0.874 | 0.881 | 0.855 | 0.835 | 0.819 |
| K = 340 | 0.905 | 0.934 | 0.906 | 0.875 | 0.854 | 0.855 | 0.817 | 0.785 | 0.772 | 0.762 |
| K = 450 | 0.894 | 0.922 | 0.879 | 0.860 | 0.824 | 0.804 | 0.780 | 0.766 | 0.745 | 0.727 |
| K = 560 | 0.901 | 0.913 | 0.893 | 0.850 | 0.815 | 0.803 | 0.775 | 0.750 | 0.722 | 0.723 |
| K = 670 | 0.901 | 0.912 | 0.870 | 0.802 | 0.779 | 0.751 | 0.729 | 0.707 | 0.689 | 0.673 |
| K = 780 | 0.891 | 0.910 | 0.855 | 0.827 | 0.788 | 0.766 | 0.745 | 0.705 | 0.667 | 0.636 |
| K = 890 | 0.878 | 0.918 | 0.863 | 0.808 | 0.764 | 0.734 | 0.686 | 0.671 | 0.639 | 0.616 |
| K = 1000 | 0.886 | 0.876 | 0.851 | 0.776 | 0.733 | 0.704 | 0.693 | 0.666 | 0.633 | 0.614 |
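The empirical coverage entries in these tables are simply the proportion of simulation runs whose confidence interval contains the true \(\beta\). A minimal sketch with hypothetical interval endpoints (illustrative values, not the simulation output):

```r
# Hypothetical 95% CI endpoints for five simulation runs (made-up values)
TrueBeta <- -0.032
CIlow <- c(-0.040, -0.035, -0.030, -0.045, -0.033)
CIup  <- c(-0.025, -0.030, -0.028, -0.038, -0.029)
# One coverage indicator per run; the empirical coverage is their mean
covered <- TrueBeta >= CIlow & TrueBeta <= CIup
mean(covered)  # 0.6
```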
Somewhat better.
norm_tAvg <- c()
# Read in data (again, sigh)
for(i in 1:nsim){
ValSim <- read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = S3_colnames, header = TRUE) %>%
tbl_df()
ValSim <- rename(ValSim, beta = beta.X_smartph_hrs)
# CI around the averaged beta using normal quantiles and t-quantiles
# SE of the average over K estimates: sd(beta)/sqrt(K - 1)
# (sd(beta) alone is the spread of the individual subsample estimates)
norm_tmp <- ValSim %>% group_by(K, nRSloops) %>%
mutate(
CIlow = beta - (qnorm(0.025, lower.tail = FALSE) * (sd(beta)/(sqrt(K - 1)))),
CIup = beta + (qnorm(0.025, lower.tail = FALSE) * (sd(beta)/(sqrt(K - 1)))),
type = 'norm') %>% group_by(K, nRSloops, TrueBeta, type) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
AvgLow = mean(CIlow, na.rm = TRUE),
AvgUp = mean(CIup, na.rm = TRUE)) %>% ungroup() %>% rowwise() %>%
mutate(coverage_ind = ifelse(TrueBeta >= AvgLow && TrueBeta <= AvgUp, 1, 0),
sim = i)
t_tmp <- ValSim %>% group_by(K, nRSloops) %>%
mutate(
CIlow = beta - (qt(0.025, df = (K - 1), lower.tail = FALSE) * (sd(beta)/(sqrt(K - 1)))),
CIup = beta + (qt(0.025, df = (K - 1), lower.tail = FALSE) * (sd(beta)/(sqrt(K - 1)))),
type = 't') %>% group_by(K, nRSloops, TrueBeta, type) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
AvgLow = mean(CIlow, na.rm = TRUE),
AvgUp = mean(CIup, na.rm = TRUE)) %>% ungroup() %>% rowwise() %>%
mutate(coverage_ind = ifelse(TrueBeta >= AvgLow && TrueBeta <= AvgUp, 1, 0),
sim = i)
# Construct general CI by averaging endpoints
norm_tAvg <- bind_rows(norm_tAvg,norm_tmp, t_tmp)
}
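The intervals above use sd(beta)/sqrt(K - 1) as the standard error of the averaged estimate, i.e. they treat the K subsample estimates as roughly independent draws so that the SE of their mean shrinks with \(\sqrt{K}\). A toy sketch with simulated stand-in values (not the PIM estimates):

```r
set.seed(1)
K <- 120
beta_hat <- rnorm(K, mean = 0.14, sd = 0.05)  # stand-ins for the K subsample estimates
avg <- mean(beta_hat)
se  <- sd(beta_hat) / sqrt(K - 1)             # SE of the averaged estimate
# 95% intervals with normal and t quantiles; the t interval is slightly wider
ci_norm <- avg + c(-1, 1) * qnorm(0.025, lower.tail = FALSE) * se
ci_t    <- avg + c(-1, 1) * qt(0.025, df = K - 1, lower.tail = FALSE) * se
```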
# Plot
norm_tAvg %>% filter(type == 't' & K == 120 & nRSloops == 120) %>%
slice(round(seq(1,nsim,length.out = 50),0)) %>%
ggplot(., aes(x = TrueBeta)) + geom_vline(aes(xintercept = TrueBeta)) +
geom_point(aes(x = AvgBeta, y = sim)) +
geom_segment(aes(x = AvgLow, xend = AvgUp, y = sim, yend = sim)) +
scale_x_continuous("beta") + scale_y_continuous("simulation") +
ggtitle("95% CI around beta PIM estimate")
resTable_tmp <- norm_tAvg %>% filter(type == 't') %>% group_by(K,nRSloops) %>% summarise(EC = mean(coverage_ind))
resTable <- matrix(resTable_tmp$EC, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(resTable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(resTable) <- paste('K = ', K_vec, sep = '')
knitr::kable(resTable)
|  | B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.959 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| K = 120 | 0.439 | 0.934 | 0.985 | 0.996 | 1.000 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 |
| K = 230 | 0.312 | 0.804 | 0.920 | 0.959 | 0.979 | 0.985 | 0.991 | 0.995 | 0.992 | 0.995 |
| K = 340 | 0.244 | 0.738 | 0.839 | 0.881 | 0.919 | 0.936 | 0.946 | 0.955 | 0.960 | 0.953 |
| K = 450 | 0.214 | 0.629 | 0.759 | 0.806 | 0.822 | 0.855 | 0.876 | 0.879 | 0.890 | 0.900 |
| K = 560 | 0.211 | 0.571 | 0.693 | 0.756 | 0.774 | 0.802 | 0.807 | 0.821 | 0.832 | 0.840 |
| K = 670 | 0.178 | 0.503 | 0.627 | 0.665 | 0.683 | 0.721 | 0.732 | 0.753 | 0.764 | 0.766 |
| K = 780 | 0.174 | 0.493 | 0.600 | 0.634 | 0.660 | 0.678 | 0.703 | 0.706 | 0.702 | 0.713 |
| K = 890 | 0.137 | 0.480 | 0.544 | 0.582 | 0.618 | 0.628 | 0.627 | 0.643 | 0.638 | 0.636 |
| K = 1000 | 0.170 | 0.427 | 0.507 | 0.522 | 0.542 | 0.575 | 0.597 | 0.601 | 0.600 | 0.604 |
Again, scale the CI according to bootstrap m-out-of-n theory.
scaledSD <- data.frame() %>% tbl_df()
# Read in data (again, sigh)
for(i in 1:nsim){
ValSim <- read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = S3_colnames, header = TRUE) %>%
tbl_df()
ValSim <- rename(ValSim, beta = beta.X_smartph_hrs)
# CI using SD of beta with scaling
sdScale_tmp <- ValSim %>% group_by(K, nRSloops, TrueBeta) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
sdBeta = sd(beta, na.rm = TRUE)) %>%
mutate(CIlow = AvgBeta - (qnorm(0.025, lower.tail = FALSE) * sdBeta * sqrt(K/n)),
CIup = AvgBeta + (qnorm(0.025, lower.tail = FALSE) * sdBeta * sqrt(K/n)),
type = 'scaledSD') %>%
ungroup() %>% rowwise() %>%
mutate(coverage_ind = ifelse(TrueBeta >= CIlow && TrueBeta <= CIup, 1, 0),
sim = i)
# Collect the values over all simulations
scaledSD <- bind_rows(scaledSD,sdScale_tmp)
}
scaledSDTable_tmp <- scaledSD %>% group_by(K,nRSloops) %>% summarise(EC = mean(coverage_ind))
scaledSDTable <- matrix(scaledSDTable_tmp$EC, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(scaledSDTable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(scaledSDTable) <- paste('K = ', K_vec, sep = '')
knitr::kable(scaledSDTable)
|  | B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.021 | 0.079 | 0.059 | 0.057 | 0.048 | 0.044 | 0.028 | 0.033 | 0.025 | 0.021 |
| K = 120 | 0.124 | 0.320 | 0.440 | 0.505 | 0.561 | 0.584 | 0.595 | 0.613 | 0.616 | 0.622 |
| K = 230 | 0.152 | 0.447 | 0.565 | 0.647 | 0.676 | 0.737 | 0.744 | 0.776 | 0.785 | 0.803 |
| K = 340 | 0.164 | 0.552 | 0.652 | 0.712 | 0.753 | 0.785 | 0.789 | 0.796 | 0.817 | 0.825 |
| K = 450 | 0.189 | 0.567 | 0.703 | 0.755 | 0.788 | 0.808 | 0.829 | 0.829 | 0.846 | 0.854 |
| K = 560 | 0.241 | 0.621 | 0.748 | 0.809 | 0.817 | 0.849 | 0.857 | 0.865 | 0.871 | 0.879 |
| K = 670 | 0.235 | 0.652 | 0.762 | 0.794 | 0.823 | 0.842 | 0.860 | 0.864 | 0.882 | 0.873 |
| K = 780 | 0.257 | 0.706 | 0.790 | 0.833 | 0.871 | 0.876 | 0.885 | 0.888 | 0.900 | 0.903 |
| K = 890 | 0.263 | 0.716 | 0.826 | 0.845 | 0.876 | 0.889 | 0.898 | 0.907 | 0.916 | 0.920 |
| K = 1000 | 0.313 | 0.737 | 0.830 | 0.851 | 0.875 | 0.891 | 0.901 | 0.898 | 0.908 | 0.917 |
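The \(\sqrt{K/n}\) factor in these intervals is the m-out-of-n rescaling: the spread of estimates across subsamples of size K reflects size-K sampling variability, and multiplying by \(\sqrt{K/n}\) translates it to the precision expected at the full sample size n. A toy illustration with sample means of standard-normal data (an assumed stand-in, not the PIM):

```r
set.seed(42)
n <- 250000; K <- 500
# Spread of means over subsamples of size K, versus the full-sample SE
sub_means <- replicate(200, mean(rnorm(K)))
sd_K <- sd(sub_means)        # approximately 1/sqrt(K)
se_n <- sd_K * sqrt(K / n)   # rescaled towards 1/sqrt(n)
```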
S3_colnames <- c("beta.X_smartph_hrs", "beta.X_sex",
"beta.X_econArea", "beta.X_ethnic",
"sVariance.X_smartph_hrs", "sVariance.X_sex",
"sVariance.X_econArea",
"sVariance.X_ethnic", "K", "nRSloops", "TrueBeta")
SummedSvar <- data.frame() %>% tbl_df()
# Read in data
for(i in 1:nsim){
ValSim <- read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = S3_colnames, header = TRUE) %>%
tbl_df()
ValSim <- rename(ValSim, beta = beta.X_smartph_hrs, sVariance = sVariance.X_smartph_hrs)
# CI through now summing sandwich variance of beta
SummedSvar_tmp <- ValSim %>% group_by(K, nRSloops, TrueBeta) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
SummedSvar = sum(sVariance, na.rm = TRUE)) %>%
mutate(bVar = SummedSvar * (1/nRSloops^2),
CIlow = AvgBeta - (qnorm(0.025, lower.tail = FALSE) *
sqrt(bVar)),
CIup = AvgBeta + (qnorm(0.025, lower.tail = FALSE) *
sqrt(bVar)),
type = 'avSVar', sim = i,
coverage_ind =
ifelse(TrueBeta >= CIlow && TrueBeta <= CIup, 1, 0))
# Collect the values over all simulations
SummedSvar <- bind_rows(SummedSvar,SummedSvar_tmp)
}
SummedSvartable_tmp <- SummedSvar %>% group_by(K,nRSloops) %>% summarise(EC = mean(coverage_ind))
SummedSvartable <- matrix(SummedSvartable_tmp$EC, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(SummedSvartable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(SummedSvartable) <- paste('K = ', K_vec, sep = '')
knitr::kable(SummedSvartable)
|  | B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.638 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| K = 120 | 0.945 | 0.930 | 0.908 | 0.896 | 0.873 | 0.853 | 0.836 | 0.824 | 0.802 | 0.789 |
| K = 230 | 0.945 | 0.932 | 0.915 | 0.896 | 0.888 | 0.873 | 0.871 | 0.845 | 0.824 | 0.811 |
| K = 340 | 0.948 | 0.939 | 0.911 | 0.877 | 0.850 | 0.846 | 0.815 | 0.781 | 0.773 | 0.763 |
| K = 450 | 0.942 | 0.917 | 0.876 | 0.856 | 0.821 | 0.804 | 0.781 | 0.754 | 0.738 | 0.718 |
| K = 560 | 0.952 | 0.917 | 0.893 | 0.852 | 0.815 | 0.799 | 0.772 | 0.740 | 0.719 | 0.714 |
| K = 670 | 0.951 | 0.907 | 0.864 | 0.815 | 0.779 | 0.754 | 0.726 | 0.706 | 0.682 | 0.667 |
| K = 780 | 0.946 | 0.911 | 0.854 | 0.823 | 0.781 | 0.770 | 0.735 | 0.700 | 0.673 | 0.644 |
| K = 890 | 0.937 | 0.919 | 0.871 | 0.808 | 0.768 | 0.725 | 0.686 | 0.668 | 0.635 | 0.614 |
| K = 1000 | 0.942 | 0.889 | 0.851 | 0.783 | 0.737 | 0.709 | 0.689 | 0.660 | 0.630 | 0.607 |
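Dividing the summed sandwich variances by \(B^2\) corresponds to the variance of an average: with \(\bar{\beta} = B^{-1}\sum_b \hat{\beta}_b\) and (approximately) independent \(\hat{\beta}_b\), \(\text{Var}(\bar{\beta}) = B^{-2}\sum_b \text{Var}(\hat{\beta}_b)\). A minimal numeric check with made-up variances:

```r
B <- 4
sVar <- c(0.02, 0.03, 0.025, 0.015)  # hypothetical per-iteration sandwich variances
bVar <- sum(sVar) / B^2              # variance of the averaged estimate
sqrt(bVar)                           # its standard error: 0.075
```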
Parameters:
u <- 10
alpha <- 1
sigma <- 5
trueBeta <- alpha/(sqrt(2) * sigma)
# Model (called SCEN in script)
SCEN <- 4
Start with estimated \(\beta\):
# load in estimated beta values
for(i in 1:nsim){
if(i %in% progress) print(paste0("At ", i/nsim*100, "%"))
# Values in one simulation: estimated beta values averaged within the sampling algorithm
ValSim <- try(read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"),
header = TRUE) %>% tbl_df() %>%
group_by(K, nRSloops, TrueBeta) %>% summarise(avBeta = mean(beta)) %>%
mutate(sim = i), silent = TRUE)
if(class(ValSim)[1] == "try-error"){
print(i);next
}
# Add to data frame with all simulations
valuesAllSim <- bind_rows(valuesAllSim, ValSim)
}
Same for computational time:
# load in computational time
for(i in 1:nsim){
if(i %in% progress) print(paste0("At ", i/nsim*100, "%"))
# Values in one simulation: computational time
TimeSim <- try(read.table(file = paste0(datLoc, 'uni_time_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("minutes"), header = TRUE) %>% tbl_df() %>%
bind_cols(., pairs) %>%
mutate(sim = i), silent = TRUE)
if(class(TimeSim)[1] == "try-error"){
print(i);next
}
# Add to data frame with all simulations
timeAllSim <- bind_rows(timeAllSim, TimeSim)
}
Quick summary:
tbl_df(valuesAllSim)
# A tibble: 100,000 × 5
K nRSloops TrueBeta avBeta sim
<int> <int> <dbl> <dbl> <int>
1 10 10 0.1414214 0.1332534 1
2 10 120 0.1414214 0.1598918 1
3 10 230 0.1414214 0.1691766 1
4 10 340 0.1414214 0.1729972 1
5 10 450 0.1414214 0.1733743 1
6 10 560 0.1414214 0.1748912 1
7 10 670 0.1414214 0.1727332 1
8 10 780 0.1414214 0.1755666 1
9 10 890 0.1414214 0.1776100 1
10 10 1000 0.1414214 0.1783243 1
# ... with 99,990 more rows
group_by(valuesAllSim, K, nRSloops) %>% summarise(avBeta = mean(avBeta)) %>% summary()
K nRSloops avBeta
Min. : 10 Min. : 10 Min. :0.1415
1st Qu.: 230 1st Qu.: 230 1st Qu.:0.1417
Median : 505 Median : 505 Median :0.1417
Mean : 505 Mean : 505 Mean :0.1450
3rd Qu.: 780 3rd Qu.: 780 3rd Qu.:0.1422
Max. :1000 Max. :1000 Max. :0.1742
Plotting the estimated \(\beta\) parameters for each simulation, faceting on the number of iterations (B). Next, the MSE of the estimates:
MSEtable_tmp <- group_by(valuesAllSim, K, nRSloops) %>% mutate(SE = (avBeta - TrueBeta)^2) %>%
summarise(MSE = mean(SE, na.rm = TRUE))
MSEtable <- matrix(MSEtable_tmp$MSE, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(MSEtable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(MSEtable) <- paste('K = ', K_vec, sep = '')
options(scipen = 6)
knitr::kable(MSEtable)
|  | B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.0028661 | 0.0011504 | 0.0010975 | 0.0010447 | 0.0010452 | 0.0010334 | 0.0010264 | 0.0010227 | 0.0010144 | 0.0010101 |
| K = 120 | 0.0000726 | 0.0000078 | 0.0000057 | 0.0000046 | 0.0000044 | 0.0000040 | 0.0000038 | 0.0000037 | 0.0000035 | 0.0000034 |
| K = 230 | 0.0000344 | 0.0000038 | 0.0000025 | 0.0000020 | 0.0000018 | 0.0000016 | 0.0000015 | 0.0000014 | 0.0000014 | 0.0000013 |
| K = 340 | 0.0000230 | 0.0000025 | 0.0000016 | 0.0000013 | 0.0000011 | 0.0000010 | 0.0000010 | 0.0000009 | 0.0000009 | 0.0000008 |
| K = 450 | 0.0000165 | 0.0000017 | 0.0000011 | 0.0000009 | 0.0000008 | 0.0000008 | 0.0000007 | 0.0000007 | 0.0000006 | 0.0000006 |
| K = 560 | 0.0000137 | 0.0000017 | 0.0000010 | 0.0000009 | 0.0000007 | 0.0000007 | 0.0000006 | 0.0000006 | 0.0000006 | 0.0000005 |
| K = 670 | 0.0000115 | 0.0000013 | 0.0000008 | 0.0000007 | 0.0000006 | 0.0000006 | 0.0000006 | 0.0000005 | 0.0000005 | 0.0000005 |
| K = 780 | 0.0000097 | 0.0000011 | 0.0000008 | 0.0000007 | 0.0000006 | 0.0000006 | 0.0000005 | 0.0000005 | 0.0000005 | 0.0000005 |
| K = 890 | 0.0000092 | 0.0000010 | 0.0000007 | 0.0000006 | 0.0000006 | 0.0000005 | 0.0000005 | 0.0000005 | 0.0000005 | 0.0000004 |
| K = 1000 | 0.0000080 | 0.0000010 | 0.0000007 | 0.0000006 | 0.0000005 | 0.0000005 | 0.0000005 | 0.0000004 | 0.0000004 | 0.0000004 |
Plotting the computational time. Next, normal QQ plots of the estimated \(\beta\) parameters for selected combinations of K and B:
for(j in 1:length(nRSloops_vec)){
K_temp <- K_vec[j]
B_temp <- nRSloops_vec[j]
assign(paste0("Plot", j),
valuesAllSim %>% filter(K == K_temp & nRSloops == B_temp) %>%
select(avBeta) %>% unlist(.) %>%
as.numeric() %>% gg_qq(x = ., title = paste0('K = ', K_temp, ' and B = ', B_temp))
)
}
cowplot::plot_grid(Plot1, Plot2, Plot3, Plot4, Plot5, Plot6, labels = c("A", "B", "C", "D", "E", "F"), ncol = 3)
cowplot::plot_grid(Plot7, Plot8, Plot9, Plot10, labels = c("G", "H", "I", "J"), ncol = 3)
Again, scale the CI according to bootstrap m-out-of-n theory.
scaledSD <- data.frame() %>% tbl_df()
# Read in data (again, sigh)
for(i in 1:nsim){
ValSim <- read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"),
header = TRUE) %>%
tbl_df()
# CI using SD of beta with scaling
sdScale_tmp <- ValSim %>% group_by(K, nRSloops, TrueBeta) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
sdBeta = sd(beta, na.rm = TRUE)) %>%
mutate(CIlow = AvgBeta - (qnorm(0.025, lower.tail = FALSE) * sdBeta * sqrt(K/n)),
CIup = AvgBeta + (qnorm(0.025, lower.tail = FALSE) * sdBeta * sqrt(K/n)),
type = 'scaledSD') %>%
ungroup() %>% rowwise() %>%
mutate(coverage_ind = ifelse(TrueBeta >= CIlow && TrueBeta <= CIup, 1, 0),
sim = i)
# Collect the values over all simulations
scaledSD <- bind_rows(scaledSD,sdScale_tmp)
}
scaledSDTable_tmp <- scaledSD %>% group_by(K,nRSloops) %>% summarise(EC = mean(coverage_ind))
scaledSDTable <- matrix(scaledSDTable_tmp$EC, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(scaledSDTable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(scaledSDTable) <- paste('K = ', K_vec, sep = '')
knitr::kable(scaledSDTable)
|  | B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.024 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| K = 120 | 0.104 | 0.277 | 0.313 | 0.352 | 0.334 | 0.344 | 0.323 | 0.328 | 0.329 | 0.328 |
| K = 230 | 0.143 | 0.416 | 0.496 | 0.520 | 0.549 | 0.591 | 0.593 | 0.611 | 0.615 | 0.629 |
| K = 340 | 0.153 | 0.495 | 0.621 | 0.653 | 0.692 | 0.716 | 0.726 | 0.734 | 0.741 | 0.745 |
| K = 450 | 0.196 | 0.585 | 0.685 | 0.733 | 0.756 | 0.779 | 0.794 | 0.814 | 0.818 | 0.813 |
| K = 560 | 0.204 | 0.575 | 0.720 | 0.743 | 0.798 | 0.816 | 0.835 | 0.840 | 0.852 | 0.858 |
| K = 670 | 0.245 | 0.674 | 0.759 | 0.804 | 0.819 | 0.845 | 0.861 | 0.863 | 0.869 | 0.879 |
| K = 780 | 0.252 | 0.679 | 0.777 | 0.806 | 0.827 | 0.843 | 0.866 | 0.869 | 0.875 | 0.874 |
| K = 890 | 0.282 | 0.704 | 0.787 | 0.826 | 0.843 | 0.852 | 0.878 | 0.878 | 0.881 | 0.884 |
| K = 1000 | 0.283 | 0.741 | 0.819 | 0.838 | 0.862 | 0.872 | 0.890 | 0.894 | 0.895 | 0.907 |
SummedSvar <- data.frame() %>% tbl_df()
# Read in data
for(i in 1:nsim){
ValSim <- read.table(file = paste0(datLoc, 'uni_beta_vector_simID_', i, '_SCEN_', SCEN, '.txt'),
col.names = c("beta", "sVariance", "K", "nRSloops", "TrueBeta"), header = TRUE) %>%
tbl_df()
# CI through now summing sandwich variance of beta
SummedSvar_tmp <- ValSim %>% group_by(K, nRSloops, TrueBeta) %>%
summarise(AvgBeta = mean(beta, na.rm = TRUE),
SummedSvar = sum(sVariance, na.rm = TRUE)) %>%
mutate(bVar = SummedSvar * (1/nRSloops^2),
CIlow = AvgBeta - (qnorm(0.025, lower.tail = FALSE) *
sqrt(bVar)),
CIup = AvgBeta + (qnorm(0.025, lower.tail = FALSE) *
sqrt(bVar)),
type = 'avSVar', sim = i,
coverage_ind =
ifelse(TrueBeta >= CIlow && TrueBeta <= CIup, 1, 0))
# Collect the values over all simulations
SummedSvar <- bind_rows(SummedSvar,SummedSvar_tmp)
}
SummedSvartable_tmp <- SummedSvar %>% group_by(K,nRSloops) %>% summarise(EC = mean(coverage_ind))
SummedSvartable <- matrix(SummedSvartable_tmp$EC, ncol = length(nRSloops_vec), byrow = TRUE)
colnames(SummedSvartable) <- paste('B = ', nRSloops_vec, sep = '')
rownames(SummedSvartable) <- paste('K = ', K_vec, sep = '')
knitr::kable(SummedSvartable)
|  | B = 10 | B = 120 | B = 230 | B = 340 | B = 450 | B = 560 | B = 670 | B = 780 | B = 890 | B = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| K = 10 | 0.679 | 0.075 | 0.008 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| K = 120 | 0.934 | 0.880 | 0.817 | 0.758 | 0.686 | 0.640 | 0.598 | 0.557 | 0.527 | 0.484 |
| K = 230 | 0.946 | 0.910 | 0.860 | 0.821 | 0.781 | 0.749 | 0.728 | 0.687 | 0.659 | 0.639 |
| K = 340 | 0.943 | 0.903 | 0.877 | 0.833 | 0.799 | 0.768 | 0.737 | 0.716 | 0.703 | 0.678 |
| K = 450 | 0.947 | 0.918 | 0.882 | 0.846 | 0.805 | 0.776 | 0.745 | 0.727 | 0.706 | 0.685 |
| K = 560 | 0.948 | 0.890 | 0.850 | 0.818 | 0.798 | 0.769 | 0.738 | 0.725 | 0.689 | 0.670 |
| K = 670 | 0.941 | 0.911 | 0.862 | 0.822 | 0.777 | 0.743 | 0.727 | 0.688 | 0.672 | 0.647 |
| K = 780 | 0.949 | 0.885 | 0.854 | 0.788 | 0.740 | 0.722 | 0.684 | 0.661 | 0.646 | 0.621 |
| K = 890 | 0.938 | 0.893 | 0.840 | 0.779 | 0.735 | 0.723 | 0.675 | 0.651 | 0.639 | 0.614 |
| K = 1000 | 0.944 | 0.886 | 0.834 | 0.767 | 0.716 | 0.688 | 0.652 | 0.632 | 0.618 | 0.599 |